AITopics | relative feedback

relative feedback

CueLearner: Bootstrapping and local policy adaptation from relative feedback

Schiavi, Giulio, Cramariuc, Andrei, Ott, Lionel, Siegwart, Roland

arXiv.org Artificial Intelligence | Jul-8-2025

Human guidance has emerged as a powerful tool for enhancing reinforcement learning (RL). However, conventional forms of guidance such as demonstrations or binary scalar feedback can be challenging to collect or have low information content, motivating the exploration of other forms of human input. Among these, relative feedback (i.e., feedback on how to improve an action, such as "more to the left") offers a good balance between usability and information richness. Previous research has shown that relative feedback can be used to enhance policy search methods. However, these efforts have been limited to specific policy classes and use feedback inefficiently. In this work, we introduce a novel method to learn from relative feedback and combine it with off-policy reinforcement learning. Through evaluations on two sparse-reward tasks, we demonstrate our method can be used to improve the sample efficiency of reinforcement learning by guiding its exploration process. Additionally, we show it can adapt a policy to changes in the environment or the user's preferences. Finally, we demonstrate real-world applicability by employing our approach to learn a navigation policy in a sparse reward setting.
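To make the idea of relative feedback concrete, the following minimal Python sketch applies a directional cue such as "more to the left" to an action proposed by a policy. It is an illustration only and not the CueLearner method: the function name `apply_relative_feedback`, the fixed `step` size, the action bounds, and the convention that the first action dimension is the left/right axis are all assumptions introduced here.

```python
from typing import List, Sequence


def apply_relative_feedback(
    action: Sequence[float],
    direction: Sequence[float],
    step: float = 0.25,
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Shift an action along a feedback direction and clip it to the bounds.

    `direction` encodes the cue, e.g. [-1, 0] for "more to the left" when the
    first action dimension is the left/right axis (an assumed convention).
    """
    if len(action) != len(direction):
        raise ValueError("action and direction must have the same length")
    # Move the action by a fixed step along the cue direction.
    corrected = [a + step * d for a, d in zip(action, direction)]
    # Keep every dimension inside the assumed action bounds.
    return [min(high, max(low, c)) for c in corrected]


# "More to the left" applied to an action that currently steers right by 0.5.
print(apply_relative_feedback([0.5, 0.0], [-1.0, 0.0]))
```

A corrected action of this kind could, for example, be stored alongside the original one so that an off-policy learner sees which direction the user preferred; how the cue is actually used for learning is left open in this sketch.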

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2507.0473

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.40)

Fusing Reward and Dueling Feedback in Stochastic Bandits

Wang, Xuchuang, Zeng, Qirun, Zuo, Jinhang, Liu, Xutong, Hajiesmaili, Mohammad, Lui, John C. S., Wierman, Adam

arXiv.org Artificial Intelligence | Apr-23-2025

This paper investigates the fusion of absolute (reward) and relative (dueling) feedback in stochastic bandits, where both feedback types are gathered in each decision round. We derive a regret lower bound, demonstrating that an efficient algorithm may incur only the smaller among the reward and dueling-based regret for each individual arm. We propose two fusion approaches: (1) a simple elimination fusion algorithm that leverages both feedback types to explore all arms and unifies collected information by sharing a common candidate arm set, and (2) a decomposition fusion algorithm that selects the more effective feedback to explore the corresponding arms and randomly assigns one feedback type for exploration and the other for exploitation in each round. The elimination fusion experiences a suboptimal multiplicative term of the number of arms in regret due to the intrinsic suboptimality of dueling elimination. In contrast, the decomposition fusion achieves regret matching the lower bound up to a constant under a common assumption. Extensive experiments confirm the efficacy of our algorithms and theoretical results.
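The idea of sharing one candidate arm set between both feedback types can be sketched as a single elimination step. The Python code below is a simplified illustration under stated assumptions, not the paper's elimination fusion algorithm: the Hoeffding-style confidence radius, the function names `radius` and `fuse_eliminate`, and the dictionary layout of the statistics are all hypothetical choices made for this example.

```python
import math
from typing import Dict, Iterable, Set, Tuple


def radius(n: int, delta: float) -> float:
    """Hoeffding-style confidence radius after n samples (an assumed choice)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def fuse_eliminate(
    candidates: Iterable[int],
    reward_mean: Dict[int, float],
    reward_n: Dict[int, int],
    duel_win_rate: Dict[Tuple[int, int], float],
    duel_n: Dict[Tuple[int, int], int],
    delta: float = 0.05,
) -> Set[int]:
    """Return the arms that survive both the reward and the dueling test.

    duel_win_rate[(b, a)] is the empirical fraction of duels between b and a
    that b won; pairs without data are simply skipped.
    """
    arms = list(candidates)
    survivors = set(arms)

    # Reward rule: drop an arm whose upper bound is below the best lower bound.
    best_lcb = max(reward_mean[a] - radius(reward_n[a], delta) for a in arms)
    for a in arms:
        if reward_mean[a] + radius(reward_n[a], delta) < best_lcb:
            survivors.discard(a)

    # Dueling rule: drop an arm that some other arm beats with confidence.
    for a in arms:
        for b in arms:
            n = duel_n.get((b, a), 0)
            if a != b and n > 0:
                if duel_win_rate[(b, a)] - radius(n, delta) > 0.5:
                    survivors.discard(a)

    return survivors


means = {0: 0.6, 1: 0.55, 2: 0.1}
counts = {0: 100, 1: 100, 2: 100}
# Arm 2 is removed by reward feedback, arm 1 only by dueling feedback.
print(fuse_eliminate([0, 1, 2], means, counts, {(0, 1): 0.8}, {(0, 1): 100}))
```

In this example the reward estimates alone cannot separate arms 0 and 1, while the duel statistics can, which is the kind of situation where combining both feedback types in one candidate set helps.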

artificial intelligence, data mining, machine learning, (20 more...)

2504.15812

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Conversational Dueling Bandits in Generalized Linear Models

Yang, Shuhua, Yuan, Hui, Zhang, Xiaoying, Wang, Mengdi, Zhang, Hong, Wang, Huazheng

arXiv.org Machine Learning | Jul-25-2024

Conversational recommendation systems elicit user preferences by interacting with users to obtain their feedback on recommended commodities. Such systems utilize a multi-armed bandit framework to learn user preferences in an online manner and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only enable users to provide explicit binary feedback on the recommended items or categories, leading to ambiguity in interpretation. In practice, users are usually faced with more than one choice. Relative feedback, known for its informativeness, has gained increasing popularity in recommendation system design. Moreover, current contextual bandit methods mainly work under linear reward assumptions, ignoring practical non-linear reward structures in generalized linear models. Therefore, in this paper, we introduce relative feedback-based conversations into conversational recommendation systems through the integration of dueling bandits in generalized linear models (GLM) and propose a novel conversational dueling bandit algorithm called ConDuel. Theoretical analyses of regret upper bounds and empirical validations on synthetic and real-world data underscore ConDuel's efficacy. We also demonstrate the potential to extend our algorithm to multinomial logit bandits with theoretical and experimental guarantees, which further proves the applicability of the proposed framework.
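As a rough illustration of how relative feedback fits a generalized linear model, the Python sketch below computes the probability that one item is preferred over another under a logistic link applied to the difference of their feature vectors. This is a standard modelling choice used here as an assumption, not the ConDuel algorithm itself; the function name `preference_probability` and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def preference_probability(
    theta: Sequence[float], x: Sequence[float], y: Sequence[float]
) -> float:
    """P(x is preferred over y) = sigmoid(theta . (x - y)).

    `theta` is the (unknown, here assumed given) preference vector and
    `x`, `y` are the feature vectors of the two compared items.
    """
    score = sum(t * (xi - yi) for t, xi, yi in zip(theta, x, y))
    return 1.0 / (1.0 + math.exp(-score))


theta = [1.0, 0.0]
# An item that scores higher on the first feature is preferred more often.
print(preference_probability(theta, [1.0, 0.0], [0.0, 0.0]))
# Identical items give a probability of one half.
print(preference_probability(theta, [0.3, 0.7], [0.3, 0.7]))
```

A dueling bandit learner would estimate `theta` from observed comparison outcomes; the estimation step is outside the scope of this sketch.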

algorithm, bandit, proceedings, (14 more...)

doi: 10.1145/3637528.3671892

2407.18488

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.05)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(15 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Comparison-based Conversational Recommender System with Relative Bandit Feedback

Xie, Zhihui, Yu, Tong, Zhao, Canzhe, Li, Shuai

arXiv.org Artificial Intelligence | Aug-21-2022

With the recent advances in conversational recommendation, the recommender system is able to actively and dynamically elicit user preferences via conversational interactions. To achieve this, the system periodically queries users' preferences on attributes and collects their feedback. However, most existing conversational recommender systems only enable the user to provide absolute feedback on the attributes. In practice, absolute feedback is usually limited, as users tend to provide biased feedback when expressing their preferences. Instead, the user is often more inclined to express comparative preferences, since user preferences are inherently relative. To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system. Relative feedback, though more practical, is not easy to incorporate, since its feedback scale is always mismatched with users' absolute preferences. To collect and interpret relative feedback effectively in an interactive manner, we further propose a new bandit algorithm, which we call RelativeConUCB. Experiments on both synthetic and real-world datasets validate the advantage of our proposed method over existing bandit algorithms in conversational recommender systems.

recommendation, recommender system, relative feedback, (14 more...)

doi: 10.1145/3404835.3462920

2208.09837

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China > Shanghai > Shanghai (0.05)
North America > United States > New York > New York County > New York City (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.46)
Media > Film (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Kalloori

AAAI Conferences | Feb-8-2022, 11:20:16 GMT

algorithm, kalloori, relative feedback, (2 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Modeling User Preferences Using Relative Feedback for Personalized Recommendations

Kalloori, Saikishore (Swiss Federal Institute of Technology in Zurich) | Li, Tianyu (Rakuten Institute of Technology)

AAAI Conferences | May-16-2020

Recommender systems are widely developed to learn user preferences from their past history and make predictions on the unseen items a user may like. User preferences in the form of absolute preferences, such as user ratings or clicks, are commonly used to model a user’s interest and generate recommendations. However, rating items is not the most natural mechanism that users use for making decisions in daily life. For instance, we do not rate t-shirts when we want to buy one. It is more likely that we will compare them one to one, and purchase the preferred one. In this work, we focus on relative feedback, which generates pairwise preferences as an alternative way to model user preferences and compute recommendations. In our scenario, each user is shown a set of item pairs and asked to compare them to indicate which item in the pair is more preferred. We propose a recommendation algorithm to predict a user’s relative preference for a given pair of items and compute a personalised ranking of items. We demonstrate the effectiveness of our proposed algorithm in comparison with state-of-the-art relative feedback based recommendation approaches. Our experimental results reveal that the proposed algorithm is able to outperform the baseline algorithms on popular ranking-oriented evaluation metrics.
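The step from pairwise preferences to a personalised ranking can be illustrated with a very simple baseline: rank items by how many comparisons they won. The Python sketch below is such a win-count baseline under stated assumptions, not the algorithm proposed in the paper; the function name `rank_by_wins`, the `(winner, loser)` tuple layout, and the alphabetical tie-break are hypothetical choices.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def rank_by_wins(
    comparisons: Iterable[Tuple[Hashable, Hashable]]
) -> List[Hashable]:
    """Rank items by number of pairwise wins, most preferred first.

    Each comparison is a (winner, loser) pair from a single user.
    Ties are broken by the string form of the item (an arbitrary choice).
    """
    wins: Counter = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        wins[loser] += 0  # make sure items that never win still appear
    return sorted(wins, key=lambda item: (-wins[item], str(item)))


# One user's answers to three item pairs.
user_pairs = [("a", "b"), ("a", "c"), ("b", "c")]
print(rank_by_wins(user_pairs))
```

A learned model would additionally predict preferences for pairs the user never compared, which this counting baseline cannot do.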

artificial intelligence, modeling user preference, personalized recommendation, (1 more...)

The Thirty-Third International Flairs Conference

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
